ABSTRACT

The existence of a correlation between observed radio spectral index and redshift has
long been used as a method for selecting high redshift radio galaxy candidates. We
use 9 highly spectroscopically complete radio samples, selected at different frequencies
and flux limits, to determine the efficiency of this method, and compare consistently
observed correlations between spectral index (α), luminosity (P), linear size (D) and
redshift (z) in our samples. We observe a weak correlation between z and α which
remains even when Malmquist bias is removed. The strength of the z-α correlation
is dependent on both the k-correction and sample selection frequency, in addition to
the frequency at which α is measured, and consistent results for both high and low
frequency selected samples are only seen if analysis is restricted to just extended radio
galaxies. This fits with the popular interpretation that the spectra steepen with z
because the radio lobes work against a denser IGM environment as z increases, out
to z~2-3. However, we also note that the majority of sources known at z>4 are very
compact and often display a negatively curved or peaked spectrum, indicative of youth
or merger activity, and therefore the low frequency radio spectrum as a whole should
be determined; this is something for which the new LOw Frequency ARray will be
crucial. We quantify both the efficiency and the completeness of various techniques
used to select high-z radio candidates. A steep-spectrum cut applied to low-frequency
selected samples can more than double the fraction of high-z sources, but at a cost of
excluding over half of the high-z sources present in the original sample. An angular size
cut is almost as effective a radio-based method as a steep-spectrum cut for
maximising the high-z content of large radio samples, and works for both high and low
frequency selected samples. In multi-wavelength data, selection first of infrared-faint
radio sources remains by far the most efficient method of selecting high-z sources. We
present a simple method for selecting high-z radio sources, based purely on combining
their observed radio properties of α and angular size, with the addition of the K-band
magnitude if available.

Key words: radio continuum: galaxies - galaxies: high-redshift - galaxies: active - 
galaxies: evolution 



1 INTRODUCTION 

Vast amounts of multi-frequency radio data at long wavelengths will soon
begin to flow from next generation radio instruments such as the LOw
Frequency ARray (LOFAR) and eventually the Square Kilometre Array (SKA).
With this, opportunities will arise for studying some of the earliest
radio sources in the universe, their environments and their evolution
over cosmic time. There is also the tantalizing possibility of studying
conditions within the Epoch of Reionisation itself through high-z radio
sources: if sufficiently bright radio sources can be found at redshifts
greater than 6.5, it should be possible to measure absorption signatures
of neutral hydrogen, and hence trace changes in the ionisation state of
the Universe with cosmic time (e.g. Carilli et al. 2007; Meiksin 2011).

The existence of a correlation between redshift and observed spectral
index α for powerful radio sources was first suggested by Tielens et al.
(1979), who found that the introduction of a steep spectral index cut led
to an increasing fraction of sources without optical counterparts
identified on POSS-I (R < 20) plates. Blundell et al. (1999) used the
combination of complete samples from the 3CRR (Laing et al. 1983), 6CE
(Rawlings et al. 2001) and 7CRS (Willott et al. 2003; Lacy et al. 1999)
surveys to confirm that the high frequency (5 GHz) rest-frame spectral
index correlates with redshift, but showed that this correlation is
weaker for spectral index measured at lower rest-frame frequencies. Since
then, there have been many surveys designed to pick out only Ultra Steep
Spectrum (USS) sources for further infra-red imaging and spectroscopic
follow-up (e.g. Röttgering et al. 1994; De Breuck et al. 2000), having
varying degrees of success selecting high-z sources. Many of these have
additional selection criteria such as small angular size and faint
infrared magnitude applied after the USS cut, which makes it difficult to
determine the extent to which the USS cut is responsible for selecting
high-z sources.

Klamer et al. (2006) present a sample of USS selected galaxies selected
from the Sydney University Molonglo Sky Survey (SUMSS), and discuss the
apparent mechanisms for the z-α correlation in detail. They dismiss the
possibility of k-corrections being the cause of the observed steepening
radio spectra, given that the majority of their radio spectra show no
evidence for curvature. They suggest that enhanced spectral aging due to
inverse Compton losses against the Cosmic Microwave Background (CMB) at
high redshift is the most likely origin for the observed z-α correlation,
or that it may arise due to an intrinsic relation between low frequency α
and radio luminosity coupled to a Malmquist bias. Following on from this
work, Bryant et al. (2009) compare the median redshift of several
complete samples with the median redshift obtained from USS selected
samples. They find it to be lower in complete samples, and argue that
this is strong evidence for the efficiency of the USS technique. However,
whilst USS samples clearly do select higher redshift sources, sample
comparisons between USS and complete samples are not ideal for either
optimising or quantifying the efficiency of the technique. The most
rigorous approach would be to apply selection criteria to complete
samples with significant numbers of sources at the highest redshifts
currently known, and quantify the number of high redshift sources
included/missed.
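The bookkeeping described above (apply a cut to a complete sample, then count the high redshift sources kept and missed) can be sketched as follows. This is only a minimal illustration: the function name, the thresholds (α < -1.0, z > 2) and the seven-source toy sample are assumptions invented for the example, not values or code from this study.

```python
def selection_stats(redshifts, alphas, alpha_cut=-1.0, z_high=2.0):
    """Return (efficiency, completeness) of a steep-spectrum cut.

    efficiency   = fraction of the selected sources that have z > z_high
    completeness = fraction of all z > z_high sources that survive the cut
    The default thresholds are illustrative assumptions only.
    """
    if len(redshifts) != len(alphas):
        raise ValueError("redshifts and alphas must have the same length")
    # Keep only sources steeper (more negative) than the cut.
    selected = [z for z, a in zip(redshifts, alphas) if a < alpha_cut]
    n_high_total = sum(1 for z in redshifts if z > z_high)
    n_high_selected = sum(1 for z in selected if z > z_high)
    efficiency = n_high_selected / len(selected) if selected else float("nan")
    completeness = (n_high_selected / n_high_total
                    if n_high_total else float("nan"))
    return efficiency, completeness


# Toy sample: invented values, for illustration only.
toy_z = [0.3, 0.8, 1.5, 2.4, 3.1, 2.2, 0.5]
toy_alpha = [-0.7, -1.1, -0.9, -1.3, -0.8, -1.2, -1.4]
eff, comp = selection_stats(toy_z, toy_alpha)
print(f"efficiency = {eff:.2f}, completeness = {comp:.2f}")
```

Reporting both numbers matters because a cut can raise the high-z fraction of the selected sample while still discarding many of the high-z sources that were originally present.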

Despite the wide use of the z-α technique to select high redshift
galaxies, there has been very little work on quantifying the efficiency.
Pedani (2003) states that, for the first time, they present the true
quantitative searching efficiency for high-z radio galaxies using a
sample selected from the Molonglo Reference Catalogue (MRC). They utilise
225 sources with full redshift information from this sample to measure
the efficiency of optical, USS and size selection. They find that the
efficiency (defined as the fraction of z>2 sources in the recovered
sample) of the USS criterion alone is 0.33, increasing to a maximum of
~0.59 in combination with an optical cut. However, their 225 source
sample is not complete, being composed of only objects with redshift
information amongst the complete MRC 1 Jy radio sample of 446 sources.
They argue that the redshift-complete subsample is representative of the
full sample, as both contain similar proportions of USS sources. However,
they also note that the median magnitude of the subset of galaxies
without redshifts is fainter than that for those included. This means
firstly, that there is an optical magnitude bias towards brighter
magnitudes in the analysed sample, and secondly, that the work is based
on the implicit assumption that the USS criterion is more important than
optical magnitude in selecting high redshift candidates. With 50% of the
sample not analysed, and at fainter optical magnitudes, the redshift
incompleteness towards higher redshift cannot be quantified, and this
could be substantial.

Potential biases such as these are common in the literature, due to the
difficulty and expense of building spectroscopically complete radio
samples. As such, any attempt to use large collections of radio data
available in the literature to investigate evolution of basic radio
properties is not valid, despite the large number statistics, without
clearly defined and well understood selection criteria. For example,
recent work by Khabibullina & Verkhodanov (2009) uses a large sample of
2442 radio galaxies with measured redshifts selected from large publicly
available radio source catalogues. They determine the dependence of α on
z, and select a sample of distant objects using this relation. Crucially
however, as they note, the samples they use are not complete in any
sense, and some of the largest high-z radio source samples with radio
spectra publicly available are ones with a USS criterion applied (e.g.
De Breuck et al. 2000), which will irrevocably bias spectral index
studies of any sample constructed from them.

In summary, although the existence of the so-called z-α correlation has
been known for some time, there has been little attempt to quantify its
strength consistently across a wide range of spectroscopically complete
samples at different selection frequencies, and to measure the resultant
efficiency of using a USS α cut-off in order to isolate high-z
candidates. In this study, we address these shortcomings, thus providing
a vital tool for the design of further high-z source searches from
upcoming radio surveys by new survey instruments, e.g. LOFAR. This work
builds significantly on current knowledge in five ways:

• We use nine highly spectroscopically complete and un- 
biased radio source samples. Most have a spectroscopic com- 
pleteness in the range 80-100%, and robust redshift esti- 
mates (e.g. photometric or based on the K-z relation) are 
available for the vast majority of the remaining sources, such 
that all samples are at least 95% redshift-complete. 

• We use new radio data from the CENSORS radio sample (Best et al.
2003), which contains a large number of sources with z > 2, improving
high redshift statistics.

• Selection frequency effects are fully explored - four sam- 
ples are selected at frequencies below 200 MHz, and five at 
1.4 GHz. 

• The samples have a wide range of flux density limits, so that
correlations such as the P-α and z-α relations may be safely
disentangled.

• We also consider radio linear size (D) in order to inves- 
tigate its role in selecting high redshift sources. 

The layout of this paper is as follows. In Section 2 we describe the
complete radio samples used in this study. In Section 3 we give a brief
summary of radio source properties and sample selection effects. In
Section 4 we investigate observable trends and employ principal component
analysis to identify fundamental correlations in the PDαz parameter space
for various collections of samples. In Section 5 we attempt to fit
various functional forms for α to the observed data, and identify large
intrinsic scatter in α, not dependent on P, z or D. In Section 6 we
discuss the physical origins of the observed z-α correlation, and finally
in Section 7 we discuss the implications of our findings in the search
for the highest redshift radio galaxies, and use complete samples to
explore the efficiency of techniques often used in the literature to find
these.

Throughout this work a ΛCDM cosmology is assumed, with Ω_Λ = 0.7,
Ω_m = 0.3 and H_0 = 70 km s^-1 Mpc^-1, and magnitudes are in the Vega
system.

Figure 1. Radio luminosity vs redshift plane coverage for the low frequency selected samples 3CRR, 6CE, 7CRS and TOOTS (left) and
for the high frequency selected samples WP85r, PSRr, CoNFIG 1 & 2r, CENSORS and Hercules (right). Only sources with α < -0.5,
as used in this study, are shown. The boxes indicate the Malmquist-bias-free sections of the P-z plane at high and low frequency, which
we use in later analysis, for 151 MHz samples: logP=27.75-29.5, z=0.5-3.5, and for 1.4 GHz samples: logP=26.25-29.0, z=1.0-4.5.

Table 1. Details of the complete samples used in this study. Note that the spectral indices for CoNFIG are taken from Gendre et al.
(2010), from a linear fit to flux densities between 1.4 GHz and 178 MHz, rather than a two point spectral index.

Survey     Selection ν (MHz)  No. of sources  Sky area (sr)  Flux limit (Jy)     % spec z  % no z  α range (MHz)
3CRR       178                173             4.239          S178 > 10.9         100       0.0     750-151
6CE        151                58              0.102          2.00 < S151 < 3.93  97        3.0     1400-151
7CI        151                37              0.0061         S151 > 0.51         90        0.0     1400-151
7CII       151                37              0.0069         S151 > 0.48         90        0.0     1400-151
7CIII      151                54              0.009          S151 > 0.50         95        0.0     1400-151
TOOTS-00   151                47              0.0015         S151 > 0.10         85        2.0     1400-151
WP85r      1400               138             9.81           S1400 > 4           95        0.0     5000-1400
CoNFIG1    1400               273             1.5            S1400 > 1.3         83        3.6     1400-178
CoNFIG2r   1400               61              0.89           1.0 < S1400 < 1.3   52        1.9     1400-178
PSRr       1400               59              0.075          S1400 > 0.36        61        0.0     2700-1400
CENSORS    1400               135             0.0018         S1400 > 0.0072      78        3.7     1400-325
Hercules   1400               64              0.00038        S1400 > 0.002       66        3.0     1400-610

2 COMPLETE RADIO SAMPLES SELECTION 

We want to quantify the z-α correlation at a wide range of frequencies
and flux density limits, determine to what extent this is an intrinsic
property of sources (rather than being driven by, for example, a P-α
correlation) and understand any selection effects present. In order to do
this, we collate data from several complete samples already available in
the literature: the 3CRR, 6CE, 7CRS and TOOTS-00 samples selected at low
frequency, and the WP85r, CoNFIG1&2, PSRr, CENSORS and Hercules samples
selected at high frequencies. These samples are described below,
summarised in Table 1 and displayed on the P-z plane in Figure 1.

2.1 The 3CRR Sample 

The 3CRR, or Third Cambridge Revised Revised, sample of extragalactic
radio sources (Laing et al. 1983) is a complete sample containing all
radio sources above 10.9 Jy at 178 MHz in an area of sky covering 4.239
sr. The sample comprises 173 objects in total, and is 100%
spectroscopically complete. The data were obtained from the 3CRR
catalogue webpage maintained online. As this sample is the only low
frequency selected sample observed at 178 MHz as opposed to 151 MHz, the
flux densities are converted to 151 MHz fluxes assuming the spectral
indices given in the catalogue. The observed spectral index α is measured
between 750 MHz and 178 MHz for this sample.
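The 178 MHz to 151 MHz conversion mentioned above is a power-law extrapolation, S_ν ∝ ν^α. A minimal sketch is given below; the function name and the example values (a 10.9 Jy source with α = -0.8) are assumptions chosen for illustration, not catalogue entries.

```python
def scale_flux(flux_jy, alpha, nu_from_mhz, nu_to_mhz):
    """Scale a flux density between two frequencies assuming S_nu ∝ nu**alpha."""
    if nu_from_mhz <= 0 or nu_to_mhz <= 0:
        raise ValueError("frequencies must be positive")
    return flux_jy * (nu_to_mhz / nu_from_mhz) ** alpha


# Hypothetical source at the 3CRR flux limit with an assumed alpha of -0.8.
s151 = scale_flux(10.9, -0.8, nu_from_mhz=178.0, nu_to_mhz=151.0)
print(f"S_151 = {s151:.2f} Jy")
```

For a steep (negative) α the flux density rises towards the lower frequency, so the 151 MHz value is larger than the 178 MHz one.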

3CRR catalogue: http://3crr.extragalactic.info/

2.2 The 6CE Sample

The 6CE sample is based on an original sample selected by Eales (1985)
from the Sixth Cambridge radio survey (6C), comprising 67 radio sources
between 2.2 Jy and 4 Jy selected at 151 MHz, over a sky area of 0.102 sr.
For this study, we use a reselected, updated version available online,
the 6CE sample of Rawlings et al. (2001). This consists of all sources
with a 151 MHz flux density in the range 2.00 ≤ S151 ≤ 3.93 Jy in the
same 0.102 sr patch of sky. There are 59 sources in total, with all but
one having a firm identification, and 56 of the 59 having spectroscopic
redshifts. Of the three sources without spectroscopic redshifts, one is
obscured by a bright star and so is excluded from the sample, and the
other two have a redshift estimate from the K-z relation. Observed
spectral indices have been calculated between 1.4 GHz and 151 MHz using
1.4 GHz fluxes obtained from the NVSS.
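A two point observed spectral index of the kind used here follows directly from the definition S_ν ∝ ν^α. The sketch below assumes that convention; the function name and the flux densities are invented for illustration.

```python
import math


def two_point_alpha(s1_jy, nu1_mhz, s2_jy, nu2_mhz):
    """Two point spectral index alpha, with the convention S_nu ∝ nu**alpha."""
    if min(s1_jy, s2_jy, nu1_mhz, nu2_mhz) <= 0:
        raise ValueError("fluxes and frequencies must be positive")
    if nu1_mhz == nu2_mhz:
        raise ValueError("the two frequencies must differ")
    return math.log(s1_jy / s2_jy) / math.log(nu1_mhz / nu2_mhz)


# Hypothetical source: 2.5 Jy at 151 MHz and 0.42 Jy at 1.4 GHz.
alpha = two_point_alpha(2.5, 151.0, 0.42, 1400.0)
print(f"alpha = {alpha:.2f}")
```

With this sign convention a steep-spectrum source has a negative α, so "steeper than -0.5" means α < -0.5.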



2.3 The 7CRS Sample

The 7CRS, or Seventh Cambridge Redshift Survey, is composed of three
subsamples, 7CI, 7CII and 7CIII. 7CI and 7CII are each composed of 37
sources with flux density limits of S151 ≥ 0.51 Jy and S151 ≥ 0.48 Jy
respectively in the 7C survey and are defined in Willott et al. (2003).
The redshifts and linear sizes for the 7CI & 7CII samples are available
online from the data of Grimes et al. (2004). The 7C-III sample contains
54 objects with a flux limit of S151 > 0.50 Jy, detailed in Lacy et al.
(1999). We utilise the 7CIII data from Table 8 in Lacy et al. (1999) to
get the luminosities, redshifts and linear sizes of the sample.

The spectral indices for this sample are not yet available in a
collective form in the literature. We estimate the observed spectral
index by cross-matching 151 MHz fluxes for each source from the 7C 151
MHz catalogue of Hales et al. (2007) with the NRAO VLA Sky Survey (NVSS;
Condon et al. 1998) at 1.4 GHz. We checked all extended sources listed in
the 7C Hales catalogue as having separate components, and cross checked
maps at 151 MHz with NVSS maps in order to correctly identify components
and catalogue the correct integrated flux for each source. We have also
matched the source list with the TEXAS 365 MHz/WENSS 327 MHz surveys, the
5C 408 MHz survey, and finally the VLA Low Frequency Sky Survey at 74 MHz
(VLSS; Cohen et al. 2007), with the addition of the 38 MHz 8C survey for
7CIII (Lacy et al. 1999), all of which are of comparable resolution, for
later curvature analysis (see Section 6, and Ker et al., in preparation).
We note that it is possible that very extended sources may not have
correct fluxes in these catalogues.

7CI and 7CII both have 90% spectroscopic redshift completeness, and 7CIII
is 95% complete. The remaining sources in all three subsamples have
photometric redshifts estimated from the K-z relation.

2.4 The TOOTS-00 Sample

The TOOT00 region (Vardoulaki et al. 2010) is the first complete region
of the Tex-Ox-1000 redshift survey of radio sources. This survey selects
all sources above 100 mJy in the Cambridge 7C 151 MHz survey, and is
designed to be approximately 5 times fainter than the 7CRS, with much
greater numbers. Vardoulaki et al. (2010) present complete radio,
near-infrared and spectroscopic data or redshift estimates for the first
region of the survey, comprising 47 sources. 40 of the radio sources have
spectroscopic redshifts, with a further six using a redshift estimated
from the K-z relation. The final source has a K-limit only and we adopt
the lower redshift limit as the redshift estimate for this source (the
K-band data reaches sufficient depth to place the source at high-z, and
hence for the lower redshift limit to be adopted as the redshift estimate
with little loss of accuracy). The observed spectral index was calculated
for each source using flux data at 151 MHz and 1.4 GHz (NVSS) from
Vardoulaki et al. (2010).



6CE sample data: http://www-astro.physics.ox.ac.uk/~sr/6ce.html

2.5 The Wall and Peacock 2.7 GHz Sample

The original Wall & Peacock (1985) 2.7 GHz sample of 233 sources covers
9.81 sr of sky, and includes all radio sources brighter than 2 Jy. The
sample now stands at 98% spectroscopically complete (Rigby et al. 2011).
In this study, we use the 138 source sample, reselected by Rigby et al.
(2011) to be complete at 1.4 GHz to a flux limit of 4 Jy with a spectral
index between 5 and 1.4 GHz steeper than -0.5. This re-selected sample is
97% spectroscopically complete, and the remaining three sources have
photometric redshift estimates. We refer to this sample as WP85r.

2.6 The CoNFIG Samples 

We utilise two complete samples from the Combined NVSS-FIRST Galaxy
catalogue (CoNFIG), CoNFIG regions 1 and 2 (Gendre et al. 2010).

CoNFIG1 contains 273 sources, is complete to 1.3 Jy at 1.4 GHz, and is
83% spectroscopically complete. In CoNFIG1, 226 sources have
spectroscopic redshifts, 37 have photometric redshift estimates and 10
sources (4%) have only lower redshift limits from SDSS i-band
non-detections. These non-detections are not sufficiently deep to provide
a useful constraint on the redshift (the SDSS limiting i-band magnitude
only constrains each source to z ≳ 1). However, of these 10 sources, only
four have an observed spectral index steeper than -0.5, and hence should
be included in the analysis (see Section 3). We choose not to include
these four sources as the redshift estimate is not reliably constrained,
and such a small fraction will have a statistically insignificant effect
on the results. All four sources have very different morphological types
and spectral indices, so are unlikely to be biased toward any one
redshift range.

CoNFIG2 contains 132 sources and is complete between 1.3 Jy and 0.8 Jy at
1.4 GHz (only sources with 1.4 GHz fluxes less than 1.3 Jy were used from
CoNFIG2, to ensure no duplication with sources also in CoNFIG1). At
fainter flux densities the redshift completion of CoNFIG2 is relatively
low, so we reselect the sample to above 1 Jy at 1.4 GHz, creating a new
sample of 61 sources which we refer to as CoNFIG2r. For this revised
sample, reselecting to above 1 Jy at 1.4 GHz reduces the number of
sources without a redshift to a negligible 3. Of these three sources,
only two have a spectral index steeper than -0.5, and hence should be
included in the analysis. As these two have greatly differing spectral
indices and morphologies, again they are unlikely to be limited to any
one redshift range, and similarly to CoNFIG1, we do not include these two
sources in the subsequent analysis.

The observed spectral index is taken from Gendre et al. (2010),
calculated using a linear fit to flux data points between 1.4 GHz and 151
MHz. We cross-matched the CoNFIG catalogues with the VLSS and 7C 151 MHz
(Hales et al. 2007) catalogue, giving flux data at 74, 151, 365, 408,
1400, 2700 and 5000 MHz for both samples to allow curvature analysis (see
Section 6, and Ker et al., in prep). Frequency coverage for this dataset
is very good: only 28 sources have no data at 74/151 MHz, and only 15
have no 2.7/5 GHz data.

2.7 Parkes Selected Regions 

The original Parkes Selected Regions sample (Wall et al. 1968; Downes et
al. 1986; Dunlop et al. 1989) is a complete 178 object sample containing
all radio sources brighter than 0.1 Jy over a 0.075 sr sky area at 2.7
GHz. We have reselected the sample at 1.4 GHz with S1400 > 0.36 Jy as
PSRr, to which flux limit there are 59 sources with an observed spectral
index between 2.7 GHz and 1.4 GHz steeper than -0.5. 36 of these sources
have spectroscopic redshifts, with the remaining 23 having redshift
estimates from the K-z relation or photometric spectral fitting.

2.8 The CENSORS Sample 

The Combined EIS-NVSS Survey of Radio Sources, or CENSORS, sample is a
135 source sample of all radio sources with an NVSS 1.4 GHz flux density
greater than 7.2 mJy in a six square degree patch of the sky centred on
the ESO Imaging Survey (EIS) Patch D (Best et al. 2003). The sample
currently stands at 96% identified and 78% spectroscopically complete,
and is currently one of the largest highly spectroscopically complete
faint 1.4 GHz selected samples in existence (Brookes et al. 2006; Brookes
et al. 2008; Rigby et al. 2011; Ker et al. in prep). 105 sources have
spectroscopic redshifts, and of the remaining 30, 25 have redshift
estimates based on the K-z relation, and 5 have only a lower limit
redshift estimate from a K-band non-detection. At K-band limits of ~19
and above, the non-detections are sufficiently deep that the sources must
be at high redshift, and the lower redshift limit can be adopted as the
estimated redshift without great loss of accuracy. The observed spectral
index is measured between 1.4 GHz and 325 MHz (see Ker et al., in prep
for 325 MHz data).

2.9 The Hercules Sample 

The Hercules sample is taken from a field in the Leiden-Berkeley Deep
Survey, and consists of 64 sources selected to have a flux density
greater than 2 mJy at 1.4 GHz (Waddington et al. 2001). The spectroscopic
completeness stands at 66%, with 20 sources having photometric redshifts
based on the K-z relation, and the final two having a redshift limit
estimated from K-band limits. Again, at K-band limits of 20.7 and 19.85
mag respectively, the non-detections are sufficiently deep that the
sources must be at high redshift, and the lower redshift limit can be
adopted as the estimated redshift. Observed spectral indices are
calculated between 1.4 GHz and 610 MHz.



3 RADIO SOURCE PROPERTIES AND SAMPLE SELECTIONS

Complete radio samples will select very different popula- 
tions, depending on the frequency at which they are selected, 
and their flux density limit, as different physical contribu- 
tions dominate at differing rest-frame frequencies. In this 
study, we utilise samples selected in both the MHz and GHz 
regimes, representative of existing spectral index studies. 

Figure 2 illustrates the contributing components of a typical extended
radio galaxy. If the source is not highly beamed, i.e. not viewed along
the jet axis, the emission is dominated at low frequency by synchrotron
emission in the radio lobes. Radio lobes typically display a
steep-spectrum power law slope, which can steepen further at higher
frequency due to both synchrotron and inverse Compton losses (see for
example Carilli et al. (1991), who analyse the radio spectrum of the
well-studied local radio galaxy Cygnus A in depth). At low frequencies,
the lobe spectrum can turn over due to synchrotron self-absorption. The
frequency at which this happens depends on both the size and intensity of
the emitting component: it occurs at higher frequencies for smaller
emitters, leading to the smallest radio sources at sub-kpc size being GHz
Peaked Sources (GPS).

At higher frequencies (above a few GHz), the contri- 
bution to the spectrum from the core is often important. 
Emission from the core is typically flat spectrum, due to the 
superposition of self-absorbed components of different sizes 
at the base of the radio jet. If the jet is orientated towards 
us, it can be Doppler-boosted by beaming, and can become 
dominant at lower frequencies. 

As can be seen from Figure 2, if a sample is selected at a few hundred
MHz, up to high redshifts the radio emission will still be probing the
lobe-dominated regime, giving a sample of similar, directly comparable
sources. However, if a sample is selected at GHz frequencies or above,
sources with a significant core component that are orientated such that
the jet is aligned along the line of sight towards us (beamed) will be
preferentially included, especially at higher redshifts.
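The frequency argument above rests on the simple relation ν_rest = ν_obs (1 + z). As a short worked illustration (the function name is an assumption; the selection frequencies and the redshifts two and six are those used in this paper and in Figure 2):

```python
def rest_frame_frequency(nu_obs_mhz, z):
    """Rest-frame frequency (MHz) sampled by an observation at nu_obs_mhz."""
    return nu_obs_mhz * (1.0 + z)


for nu_obs in (151.0, 1400.0):
    for z in (2.0, 6.0):
        nu_rest = rest_frame_frequency(nu_obs, z)
        print(f"nu_obs = {nu_obs:6.0f} MHz, z = {z:.0f} -> nu_rest = {nu_rest:6.0f} MHz")
```

A 151 MHz selection therefore still samples roughly 1 GHz in the rest frame at z = 6, whereas a 1.4 GHz selection samples several GHz by z = 2 and close to 10 GHz by z = 6, which is the regime in which the core contribution is described as often important.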

It is thought that the observed z-α relation may arise because sources at
higher redshift have lobes doing work against a denser medium. Working
against a denser medium means there will be fewer adiabatic expansion
losses, and therefore greater synchrotron losses, with the result that
the source is brighter but the radio spectrum steepens faster. However,
as shown, in GHz selected samples, the observed spectral index may
alternatively be flattened at the highest redshifts by an increasing
contribution of a core component, and be less affected by environment.

Although only rest-frame spectral indices should have any direct physical
correlation with other observables (observed spectral indices being a
good approximation), as far as possible, we utilise a traditional two
point observed spectral index, so as to match the situation most widely
encountered in the literature and most simply provided by the
observations, e.g. for the selection of USS sources.

Figure 2. A sketch of the contributions of the various components of a typical double radio galaxy to the rest-frame radio regime, and
the effects of synchrotron self-absorption, losses and beaming on the observed radio spectrum. Also shown are the observed frequencies
at which these features would be observed at redshifts two and six for comparison.

There is a strong argument to exclude as far as possible all
significantly beamed sources identified in the samples, as not only will
their spectral index estimates be distorted (the beamed component
generally being flatter spectrum), they will also be heavily
foreshortened in size. However, as our primary motivation is measuring
the efficiency of radio-based correlations in selecting high redshift
radio sources from radio surveys, we must adopt a simple approach to
removing these that can be widely applied to blind radio surveys. In most
comparable observational studies, a cut of α = -0.5 is used as a division
to separate flat and steep spectrum sources (and indeed such a cut has
already been applied in the definition of some of the samples we use).
Hence, in order to analyse comparable parts of the radio spectrum, we
restrict analysis in this study to sources with an observed spectral
index less than -0.5; these will largely be of a similar type (lobe
dominated). Sources with a flatter spectral index represent a composite
population: as well as quasars and core-dominated sources, they may also
include, for example, young peaked radio sources (see analysis in Ker et
al., in prep). We do not remove starburst galaxies, as their numbers are
negligibly low in all samples.

Luminosities were calculated for each sample using
P_ν = 4π S_ν (1+z)^(-1-α) D_L^2, where α is the observed spectral index,
defined as S_ν ∝ ν^α, and D_L is the luminosity distance. The transverse
linear size in Mpc of each radio source was calculated using D = θ D_A,
where θ is the maximum measured angular extent of the radio source on the
sky in radians, and D_A is the angular diameter distance
(D_A = D_L/(1+z)^2). For Hercules and CENSORS θ is determined at 1.4 GHz;
for TOOTS, 7CRS, 6CE and 3CRR θ is measured at 151 MHz. There are no
readily available angular size measurements in the literature for WP85r,
PSRr and CoNFIG 1 & 2r.
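The two expressions above can be evaluated with a few lines of code. The sketch below is one possible implementation under stated assumptions, not the code used for this study: it assumes the flat ΛCDM parameters quoted in Section 1, integrates the comoving distance with Simpson's rule, and returns P_ν in W Hz^-1 exactly as the formula is written. All function names and the example source are hypothetical.

```python
import math

C_KM_S = 299792.458               # speed of light, km/s
MPC_IN_M = 3.0856775814913673e22  # metres per megaparsec
JY_IN_SI = 1.0e-26                # W m^-2 Hz^-1 per jansky


def comoving_distance_mpc(z, h0=70.0, omega_m=0.3, omega_l=0.7, steps=2000):
    """Line-of-sight comoving distance (Mpc), flat LCDM, Simpson's rule.

    `steps` must be even for the Simpson weights used below.
    """
    if z <= 0:
        return 0.0

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    return (C_KM_S / h0) * total * h / 3.0


def luminosity_distance_mpc(z, **cosmo):
    """D_L = (1 + z) * D_C for a flat universe."""
    return (1.0 + z) * comoving_distance_mpc(z, **cosmo)


def radio_luminosity(flux_jy, z, alpha, **cosmo):
    """P_nu = 4 pi S_nu (1+z)^(-1-alpha) D_L^2, returned in W/Hz."""
    d_l_m = luminosity_distance_mpc(z, **cosmo) * MPC_IN_M
    return (4.0 * math.pi * flux_jy * JY_IN_SI
            * (1.0 + z) ** (-1.0 - alpha) * d_l_m ** 2)


def linear_size_mpc(theta_arcsec, z, **cosmo):
    """D = theta * D_A, with D_A = D_L / (1+z)^2 and theta in radians."""
    theta_rad = math.radians(theta_arcsec / 3600.0)
    d_a = luminosity_distance_mpc(z, **cosmo) / (1.0 + z) ** 2
    return theta_rad * d_a


# Hypothetical source: 1 Jy, z = 1, alpha = -0.8, 10 arcsec across.
log_p = math.log10(radio_luminosity(1.0, 1.0, -0.8))
size_kpc = 1000.0 * linear_size_mpc(10.0, 1.0)
print(f"log10 P = {log_p:.2f} (W/Hz), D = {size_kpc:.0f} kpc")
```

The (1+z)^(-1-α) factor is the k-correction for a pure power-law spectrum; it is only as good as the assumption that the observed two point α also describes the spectrum at the emitted frequency.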



4 OBSERVABLE TRENDS 

The complete samples detailed previously provide excellent coverage of
the PzαD parameter space. In Figure 3 the logP, logD, α and log(1+z)
planes are plotted, along with best fitting straight lines to the data.
By eye, the data appear to display similar dependencies of spectral index
to those reported by Blundell et al. (1999) for the 3CRR, 6CE and 7CRS
combined complete radio samples, namely that observed spectral indices
steepen with linear size, redshift and radio power (upper panels).
Equations of the linear dependencies fitted for spectral index on
luminosity, linear size and redshift are given in Tables 5 and 6.

As can be clearly seen from the P-z panels in Figure 3, the use of only
one complete sample means that a strong, dominating, P-z correlation due
to Malmquist bias is present, and this makes disentangling the various
dependencies between P, z, α and D very difficult. The addition of
several complete samples mitigates this correlation somewhat, and indeed
many previous studies, e.g. Blundell et al. (1999), argue that the
combination of several complete samples essentially removes the Malmquist
bias.
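One way to check whether a residual P-z correlation is driving the other trends is to restrict the sample to a rectangular box on the P-z plane and recompute a rank correlation there. The sketch below implements a Spearman rank coefficient in pure Python and applies it inside the 151 MHz box quoted in the Figure 1 caption (logP = 27.75-29.5, z = 0.5-3.5). The helper names and the toy source list are assumptions invented for the example, and no p-value is computed (a routine such as `scipy.stats.spearmanr` returns both the coefficient and a p-value if SciPy is available).

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank coefficient: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def in_pz_box(log_p, z, log_p_range, z_range):
    """True if a source lies inside the rectangular P-z box."""
    return (log_p_range[0] <= log_p <= log_p_range[1]
            and z_range[0] <= z <= z_range[1])


# Toy sources as (logP, z, alpha): invented values, for illustration only.
toy = [(26.9, 0.2, -0.6), (28.0, 0.6, -0.7), (28.3, 1.0, -0.8),
       (28.6, 2.0, -0.9), (29.0, 3.0, -1.1), (29.8, 3.8, -0.9)]
boxed = [s for s in toy if in_pz_box(s[0], s[1], (27.75, 29.5), (0.5, 3.5))]
r_box = spearman_r([math.log10(1.0 + s[1]) for s in boxed],
                   [s[2] for s in boxed])
print(f"{len(boxed)} sources in box, r = {r_box:.2f}")
```

A rank statistic is a natural choice here because it does not assume that α depends linearly on log(1+z), logP or logD.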

With the excellent coverage of the P-z plane afforded by 
our nine complete samples, we are able to test if this is 
indeed the case. We select a Malmquist-bias-free section of the 
P-z plane for both the high and low frequency selected 
samples, covering a large range in both redshift and radio power 
(see Figure 1 for the area definition), and repeat the linear 
fits (plotted as blue dashed lines in Figure 3). This utilises 
186 sources in the 151 MHz samples, and 133 sources in the 1.4 
GHz samples (reduced to 56 sources when investigating linear 
size, as for the 1.4 GHz samples only CENSORS and 
Hercules have readily available size information). The D-α 
relation appears to increase in strength when the residual 
P-z correlation is removed, whilst the z-α and P-α relations 
decrease in strength. 

Figure 3. The z-α-P-D planes for the 151 MHz (upper 6 panels) and 1.4 GHz (lower 6 panels) selected samples. Spectral indices are 
measured between 1400-151 MHz for 6CE, 7CRS, TOOTS-00 and CoNFIG 1 & 2r, 750-151 MHz for 3CRR, 5-2.7 GHz for WP85r, 2-1.4 
GHz for PSRr, 1400-325 MHz for CENSORS and between 1400 MHz and 610 MHz for Hercules. Only sources with α steeper than -0.5 
are utilised. The solid green lines indicate the best fitting straight line to the data. The blue dashed lines indicate the linear fit repeated 
for a Malmquist-bias-free section of the P-z plane, as defined in Figure 1. Note that only CENSORS and Hercules are included in the 
linear size figures for the 1.4 GHz samples (see text). 

Table 2. Spearman Rank Correlation Coefficients and the associated 2-tailed p-value for various combinations of P, D, α and z. A * denotes 
that only the CENSORS and Hercules samples were used in measuring the correlation, as these are the only two high frequency selected 
samples with size information readily available. 

Combination      151 MHz (All)       151 MHz (P-z)       1.4 GHz (All)        1.4 GHz (P-z)
logP-log(1+z)    r=0.72,  p=0.000    r=0.02,  p=0.780    r=0.61,   p=0.000    r=-0.19,  p=0.030
logP-logD        r=-0.14, p=0.010    r=0.07,  p=0.350    r*=0.01,  p=0.860    r*=0.01,  p=0.920
logP-α           r=-0.36, p=0.000    r=-0.10, p=0.180    r=-0.11,  p=0.010    r=0.01,   p=0.920
log(1+z)-α       r=-0.34, p=0.000    r=-0.15, p=0.050    r=-0.19,  p=0.000    r=-0.14,  p=0.100
log(1+z)-logD    r=-0.25, p=0.000    r=-0.15, p=0.050    r*=-0.10, p=0.211    r*=-0.53, p=0.000
logD-α           r=-0.16, p=0.001    r=-0.26, p=0.000    r*=-0.18, p=0.028    r*=-0.25, p=0.080

In Table 2 we present the Spearman rank correlation 
coefficients for the relations plotted in Figure 3. Also listed 
is the 2-tailed p-value, which gives an approximate 
indication of the probability of an uncorrelated system having a 
Spearman correlation at least as strong as the one 
calculated from the observed data. The table illustrates several 
important points. Firstly it shows that the P-z correlation 
dominates, even when several complete samples are co-added 
and analysed together; in other words, simply adding several 
complete samples does not provide sufficient coverage of the 
P-z plane to fully remove the dominant P-z correlation. 
Secondly it reveals that the D-α, D-z and z-α correlations are 
the strongest observed in both high and low frequency 
selected samples. The z-α correlation is stronger than the P-α 
correlation (correlations between P, α and D more or less 
disappear once the P-z correlation is removed). Of 
particular relevance to this study is the fact that the variation of 
observed α with size and redshift is relatively weak for both 
samples. 
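
As an illustration of the statistic quoted in Table 2, the sketch below computes a Spearman rank coefficient as the Pearson correlation of the ranks. It is not the code used for the paper: the function names are hypothetical, and the 2-tailed p-value is estimated by random shuffling (a permutation test) rather than by the analytic approximation a statistics package would normally apply.

```python
import random


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Two-tailed p-value estimate: the fraction of random shuffles of y
    whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    r_obs = abs(spearman(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)
        if abs(spearman(x, y_shuffled)) >= r_obs:
            hits += 1
    return hits / n_shuffles
```

Called as spearman(log1pz, alpha) on two equal-length lists, this returns r; a negative r for log(1+z) against α corresponds to spectra steepening with redshift.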



4.1 Principal Component Analysis 

Analysing fully covered sections of the P-z plane has shown 
that relations between P, z, α and D are strongly coupled 
to the Malmquist bias. With this in mind, we utilise 
another statistical test, Principal Component Analysis (PCA), 
which is a technique designed to pick out the intrinsic, 
dominant linear correlations existing in a multi-variable dataset, 
as opposed to secondary correlations arising due to 
combinations of others (in this case, particularly the Malmquist 
bias). The method of PCA involves calculating the 
eigenvalues and eigenvectors, composed of linear combinations of the 
normalised input parameters, which span the directions of 
maximum variance in the input dataset. These eigenvectors 
and eigenvalues describe the intrinsic correlations present in 
the dataset (principal components), along with the 
percentage of the variance in the data that each explains. The results 
of PCA are most commonly presented in table form, 
listing each of the principal components, the percentage of the 
data variance that they explain, and the composition of each 
principal component. Each principal component is composed 
of a normalised combination of the entered variables, in this 
case α, log(1+z), logP and logD, as 
$\mathrm{PC} = x_1\,\alpha + x_2\log(1+z) + x_3\log P + x_4\log D$, 
and the final four columns in the table 
present $x_1$, $x_2$, $x_3$ and $x_4$, showing the relative contributions 
of each variable for each principal component. 
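
The procedure described above can be sketched as an eigen-decomposition of the correlation matrix of the standardised variables. This is a minimal illustration using NumPy, not the authors' implementation; the function name pca and the synthetic data (a built-in P-z link and a weaker D-α anticorrelation) are assumptions made purely for demonstration.

```python
import numpy as np


def pca(data):
    """PCA of the columns of `data`.

    Returns (variance_fraction, components); each row of `components`
    holds the weights (x1, x2, x3, x4) of one principal component.
    """
    x = np.asarray(data, dtype=float)
    # Standardise each column so variables in different units count equally.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Covariance of standardised data = correlation matrix of the input.
    corr = np.cov(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort to descending variance
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs.T


# Synthetic, purely illustrative sample (NOT the paper's data).
rng = np.random.default_rng(1)
n = 300
log1pz = rng.uniform(0.0, 0.6, n)
logP = 26.0 + 4.0 * log1pz + rng.normal(0.0, 0.3, n)  # flux-limit-like P-z link
logD = rng.normal(-1.0, 0.5, n)
alpha = -0.9 - 0.1 * logD + rng.normal(0.0, 0.1, n)   # larger sources steeper

frac, comps = pca(np.column_stack([alpha, log1pz, logP, logD]))
for i, (f, c) in enumerate(zip(frac, comps), start=1):
    print(i, round(100 * f), np.round(c, 2))
```

Each printed row has the same layout as the PCA tables: component number, percentage of variance, then the weights on α, log(1+z), logP and logD. The overall sign of an eigenvector is arbitrary, so only the relative signs within a row are meaningful.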

We first look at a low frequency selected sample, 
composed of the 3CRR, 6CE, 7CRS and TOOTS samples, and 
perform a PCA (see Table 3, upper). The P-z 
correlation dominates (i.e. the first principal component is along 
an axis primarily composed of P and z), contributing roughly 
half of the observed variance. A further ~30% of the variance is 
contributed by a D-α anticorrelation, whereby sources of 
larger size have steeper spectra. The final two components 
largely just account for scatter around the two dominating 
independent relations between P-z and D-α. This is an 
important finding, consistent with that demonstrated 
in the previous section: the P-z correlation remains 
dominant even when a large collection of complete samples 
is used. 

Although PCA should successfully identify all 
underlying independent correlations in the data, we ran the analysis 
again on just the subsamples in a well-covered region of the 
P-z plane, thereby removing the selection effect (see Figure 
1). A second motivation for doing this is to restrict analysis 
to only high-power radio sources, thus studying a relatively 
uniform population (extended double radio sources), with 
little contamination from low power sources which are often 
unresolved (see for example Baldi & Capetti 2009). In this 
case, the observed variance can be attributed to two 
independent relations, each giving an almost equal contribution 
to the variance (see Table 3, lower). The largest contributor, 
at 33%, is an anticorrelation between α and D as found for 
the whole sample, followed by a 28% contribution from a 
relation between D and z. The third, 24%, contribution arises 
almost solely along the logP axis, uncorrelated with the other 
parameters. This confirms our earlier findings, that the D-α 
and D-z relations are intrinsic to the dataset, irrespective of 
the presence of Malmquist bias. 

The results for a high frequency selected collection 
of samples (composed of CENSORS and Hercules) are 
broadly similar, albeit with some differences in the 
detail (see Table 4). For the entire sample, the results are 
very similar, again with approximately half the variance 
being accounted for by the P-z correlation, and a further 30% 
by a D-α correlation. The main difference is that this latter 
correlation is weakened somewhat by the third component. 
If we then select a well covered section of the P-z plane 
(see Figure 1), then similarly to the low frequency data, 
approximately 40% of the variance is accounted for by a D-z 
anticorrelation, followed by a 32% α-D, P anticorrelation 
(see Table 4, lower). A weaker P-α correlation accounts for 
the large remainder of the variance, which removes the weak 
P-α anticorrelation contribution of the second component. 

From this analysis we can tentatively conclude that 
there are two firm independent relations present in both 
datasets, between D and α, and between P and z. By 
utilising the full P-z coverage subsamples, we confirm that the 
D-α correlation is fully independent of Malmquist bias. In the 
subsamples restricted to high power sources with good 
P-z coverage, a strong D-z anticorrelation is also seen. That 
the D-α correlation is slightly stronger in the low frequency 
dataset, and the D-z anticorrelation stronger in the high 
frequency sample, is most likely down to the differing types of 
sources which low and high frequency selected samples 
collect. Low frequency samples will primarily be composed of 
lobe-dominated sources, suffering little in the way of 
orientation bias, and hence contain a large proportion of large, 
steep-spectrum sources. High frequency selected samples will include 
many more beamed, core-dominated and young GPS/CSS 
sources, and fewer classical lobe-dominated sources. 

The correlation between D and α arises due to aging 
of the radio sources. As a source gets older, it increases 
in size and the spectrum steepens with age. The physical 
cause of the anticorrelation between D and z is subject to 
more debate. It could arise as the result of the 
environment at high redshift, or as a result of sources at high 
redshift being more likely to be younger, and hence smaller 
(cf. Blundell & Rawlings 1999). It is interesting to note that 
despite a Spearman rank test (see Table 2) suggesting the 
presence of a correlation between z and α almost as strong 
as that between D and z, the Principal Component 
Analysis does not clearly identify an independent z-α relation in 
either the high or low frequency selected samples, 
suggesting that the correlation observed may be largely a result of 
selection effects. One which may be present is that between 
radio power and linear size. It is thought that radio sources 
follow tracks on the P-D plane, beginning as high power, 
small sources, and evolving into lower power, larger sources 
in time (see for example Kaiser & Best 2007). Individual low 
frequency samples, which are more sensitive to extended 
radio lobes, show a trend for radio power to increase as linear 
size decreases, which could arise from the combination of 
Malmquist bias and the D-z correlation (and indeed, this 
correlation weakens substantially once Malmquist bias is 
removed). In a collection of low frequency selected samples, 
this trend in conjunction with any remaining Malmquist bias 
and the D-α correlation would naturally lead to an extrinsic 
contribution to the z-α correlation. 



5 A LARGE INTRINSIC SCATTER IN α 

Given the independent trends between spectral index, linear 
size and redshift, identified both visually and by the PCA 
analysis, an attempt was made to fit an analytical form to 
the spectral index using linear size, luminosity and redshift. 



PC   %     α_obs    log(1+z)   logP_151MHz   logD(Mpc)
1    49    0.38     -0.65      -0.64         0.19
2    29    -0.60    -0.10      -0.02         0.79
3    15    0.70     0.20       0.38          0.57
4    7.0   0.06     0.72       -0.67         0.12

PC   %     α_obs    log(1+z)   logP_151MHz   logD(Mpc)
1    33    0.71     -0.15      -0.30         -0.62
2    28    -0.20    0.85       0.10          -0.48
3    24    0.25     -0.13      0.95          -0.15
4    15    -0.63    -0.48      0.003         -0.61

Table 3. Upper Table: Principal Component Analysis for the 
low frequency selected samples, comprising 375 sources with α < 
-0.5. Lower Table: The same analysis repeated for a well covered 
section of the P-z plane: logP (W Hz^-1) = 27.75-29.5, z = 0.5-3.5, 
using 186 sources. 

PC   %     α_obs    log(1+z)   logP_1.4GHz   logD(Mpc)
1    47    -0.12    0.70       0.70          0.01
2    29    0.66     0.12       0.02
Further surviving entries of the upper table: -0.09, 0.69

PC   %     α_obs    log(1+z)   logP_1.4GHz   logD(Mpc)
1    42    0.005    0.71       0.48          -0.52
2    32    0.71     -0.08      -0.46         -0.52
3    19    -0.67    0.15       -0.62         -0.39
4    7.0   -0.21    -0.69      0.42          -0.56

Table 4. Upper Table: Principal Component Analysis on the 
GHz frequency selected samples of CENSORS and Hercules, 
comprising 158 sources with α < -0.5. Lower Table: Repeated for a 
well covered selection of the P-z plane: logP (W Hz^-1) = 26.25-29, 
z = 1-4.5, using 56 sources. 



Again, as detailed in Section 3, we use only sources with an 
observed spectral index steeper than -0.5. 

Table 5 and Table 6 list the best fitting coefficients for 
each relation modeled. We began with very simple linear fits, 
and progressed to fitting planes modeling all four variables. 
We can see clearly that both the reduced χ² and the residual 
standard deviation decrease, albeit by a small amount, with 
the inclusion of additional variables in the model for both 
the high and low frequency selected samples. A plane fit of 
all four variables gives the best fit, and the smallest 
deviation in α residuals, for both low and high frequency selected 
samples. The best fitting model is illustrated in Figure 4. 
Although the plane model manages to successfully remove the 
trends between spectral index and linear size, radio power 
and redshift, the key finding is that it is unable to predict the 
observed α. The intrinsic scatter in α is much greater than 
that arising from any physical trends with other observables 
present in the datasets. 
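
A plane fit of this kind can be sketched with ordinary least squares. The example below is illustrative only and is not the fitting code used for Tables 5 and 6: the function name fit_plane is hypothetical, the residual standard deviation is one possible definition of σ, and the synthetic sample simply borrows coefficient magnitudes of the order listed in Table 5, with 0.1 of Gaussian scatter added to match the assumed measurement error.

```python
import numpy as np


def fit_plane(alpha, log1pz, logP, logD, alpha_err=0.1):
    """Least-squares fit of alpha = a1*log(1+z) + a2*logP + a3*logD + a4.

    Returns (coeffs, reduced_chi2, sigma), assuming the same measurement
    error `alpha_err` on every spectral index.
    """
    alpha = np.asarray(alpha, dtype=float)
    design = np.column_stack([log1pz, logP, logD, np.ones(len(alpha))])
    coeffs, *_ = np.linalg.lstsq(design, alpha, rcond=None)
    resid = alpha - design @ coeffs
    n_par = design.shape[1]
    dof = len(alpha) - n_par                      # degrees of freedom
    reduced_chi2 = np.sum((resid / alpha_err) ** 2) / dof
    sigma = resid.std(ddof=n_par)                 # residual standard deviation
    return coeffs, reduced_chi2, sigma


# Synthetic, purely illustrative sample (NOT the paper's data).
rng = np.random.default_rng(42)
n = 500
log1pz = rng.uniform(0.0, 0.6, n)
logP = rng.uniform(25.0, 29.0, n)
logD = rng.normal(-1.0, 0.5, n)
alpha = (-0.25 * log1pz - 0.02 * logP - 0.07 * logD - 0.18
         + rng.normal(0.0, 0.1, n))

coeffs, red_chi2, sigma = fit_plane(alpha, log1pz, logP, logD)
print(np.round(coeffs, 3), round(float(red_chi2), 2), round(float(sigma), 3))
```

When the scatter about the plane equals the assumed measurement error, the reduced χ² comes out close to 1; values well above 1 indicate scatter in excess of the assumed errors.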

Whilst this was a simplistic approach designed to see if 
it was possible to predict the observed spectral index with 
any success from other properties, it should be noted that 
much more complex models, incorporating the physics of 

Table 5. The results of fitting functions of α dependent on z, P and D for the low frequency selected samples. We assume a measurement 
error of 0.1 in α for all fits, for the determination of the reduced χ² and σ. 

Model                               Sample  rχ²          σ            a1            a2            a3            a4
α=-0.8                              whole   2.47         0.15
                                    Pz      2.53         0.14
α=a1 log(1+z)+a2                    whole   [illegible]  [illegible]  [illegible]   [illegible]
                                    Pz      1.01         0.14         -0.17(0.06)   -0.80(0.02)
α=a1 logP+a2                        whole   1.99         0.15         -0.04(0.01)   0.40(0.14)
                                    Pz      1.02         0.14         -0.02(0.02)   -0.41(0.43)
α=a1 logD+a2                        whole   2.12         0.15         -0.05(0.01)   -0.88(0.01)
                                    Pz      0.92         0.14         -0.07(0.01)   -0.94(0.01)
α=a1 log(1+z)+a2 logP+a3            whole   1.95         0.15         -0.17(0.05)   -0.03(0.01)   -0.09(0.18)
                                    Pz      1.00         0.14         -0.17(0.06)   -0.01(0.02)   -0.59(0.44)
α=a1 logP+a2 logD+a3                whole   1.87         0.14         -0.05(0.01)   -0.06(0.01)   0.49(0.14)
                                    Pz      0.92         0.14         -0.02(0.02)   -0.07(0.01)   -0.44(0.44)
α=a1 log(1+z)+a2 logD+a3            whole   1.83         0.14         -0.36(0.03)   -0.07(0.01)   -0.80(0.01)
                                    Pz      0.88         0.13         -0.25(0.06)   -0.08(0.01)   -0.86(0.02)
α=a1 log(1+z)+a2 logP+a3 logD+a4    whole   1.80         0.14         -0.25(0.05)   -0.02(0.01)   -0.07(0.01)   -0.18(0.19)
                                    Pz      0.88         0.13         -0.25(0.06)   -0.01(0.02)   -0.08(0.01)   -0.70(0.44)

Table 6. The results of fitting functions of α dependent on z, P and D for the high frequency selected samples. We assume a measurement 
error of 0.1 in α for all fits, for the determination of the reduced χ² and σ. Models marked with a * use only CENSORS and Hercules, 
the only two high frequency selected samples for which there is size data readily available. 

Model                               Sample  rχ²    σ      a1             a2              a3            a4
α=-0.8                              whole   4.10   0.20
                                    Pz      3.70   0.19
α=a1 log(1+z)+a2                    whole   2.75   0.20   -0.21(0.03)    -0.80(0.01)
                                    Pz      1.56   0.18   -0.13(0.04)    -0.81(0.01)
α=a1 logP+a2                        whole   2.81   0.20   -0.01(0.003)   -0.58(0.09)
                                    Pz      1.57   0.19   -0.01(0.01)    -0.57(0.20)
α*=a1 logD+a2                       whole   3.20   0.21   -0.054(0.01)   -0.95(0.02)
                                    Pz      0.81   0.18   -0.09(0.02)    -1.01(0.04)
α=a1 log(1+z)+a2 logP+a3            whole   2.74   0.20   -0.24(0.03)    0.006(0.003)    -0.95(0.10)
                                    Pz      1.56   0.18   -0.13(0.04)    -0.0001(0.01)   -0.81(0.22)
α*=a1 logP+a2 logD+a3               whole   3.20   0.21   -0.004(0.01)   -0.05(0.01)     -0.85(0.20)
                                    Pz      0.79   0.18   -0.06(0.03)    -0.09(0.03)     0.58(0.80)
α*=a1 log(1+z)+a2 logD+a3           whole   3.14   0.21   -0.18(0.05)    -0.06(0.01)     -0.90(0.02)
                                    Pz      0.75   0.17   -0.41(0.11)    -0.12(0.01)     -0.90(0.02)
α*=a1 log(1+z)+a2 logP+a3 logD+a4   whole   3.02   0.20   -0.63(0.1)     0.08(0.02)      -0.09(0.01)   -2.90(0.38)
                                    Pz      0.74   0.17   -0.42(0.13)    0.01(0.04)      -0.12(0.03)   -0.99(0.94)

radio sources can reproduce the observed luminosity, linear 
size and redshift distributions with some success, but 
struggle to reproduce α (see for example Barai & Wiita 2007). 



It is very clear that the correlations between α and size, 
luminosity and redshift are weak. This suggests 
that the use of spectral index alone is unlikely to be efficient 
in selecting high redshift radio sources. The equally strong 
D-z correlation indicates that the inclusion of radio size 
information may increase the efficiency of selection based solely 
on radio observables. 



6 THE ORIGIN OF THE α-z CORRELATION 
IN FLUX LIMITED SAMPLES 

The tendency for observed spectral indices to steepen with 
redshift has been attributed to a k-correction: as the 
source spectrum is redshifted, a steeper part of the spectrum 
is sampled. How large this effect is has been a source of 
much debate in the literature. It has also been suggested that 
the strength of the z-α correlation increases with frequency, 
as high frequency parts of the radio spectrum undergo more 
significant synchrotron losses. 

Klamer et al. (2006) find that the majority of their USS 
sample display no curvature, and also cite the well studied high 
redshift source 4C41.17, at z=3.8, as having a straight 
radio spectrum from 26 MHz to GHz frequencies. They 
therefore infer that the k-correction is irrelevant for high-z USS 
sources. However, based on this, we cannot simply conclude 
that the contribution to any z-α correlation from the 
k-correction is negligible. In fact, Figure 5 shows the radio 
spectra of all currently known z>4 radio galaxies, the 
majority of which show some evidence of curvature in the 
observed radio spectrum. Most of the currently known radio 
galaxies at z>4 display a compact steep radio spectrum, 
with curvature occurring at low observed frequencies (~100 
MHz), data which Klamer et al. did not have for their 
sample. Bornancini et al. (2007) also confirm the presence of 
curvature at low MHz frequencies for their USS sample. 

Figure 4. Fitting the function α = a1 log(1+z) + a2 logP + a3 logD + a4 to the CENSORS and Hercules combined samples (upper five 
panels), and to the 3CRR, 6CE, 7CRS and TOOTS combined samples (lower five panels). The panels on the extreme left show the 
distributions of the spectral index α residuals (observed α minus the model predicted α), and the next four panels on each line show 
how the α residual depends on z, α, P, and D. Small points are the raw data, large points are the binned means. Fitting the functions 
clearly removes trends in z, D, and P but large scatter remains, as indicated in the α plot. 

To quantify the potential effect of the k-corrections, we 
used two samples, CoNFIG and 7CRS. These two samples 
have the best multi-frequency coverage, and hence the most 
accurately determined radio spectra. Rest frame spectral 
indices are calculated from fitting a 2nd-order polynomial 
($\log S_\nu = a_1 + a_2\log\nu + a_3\log^2\nu$) to the radio spectrum for each 
source, and measuring the gradient ($\alpha = a_2 + 2a_3\log\nu$) at the 
desired frequency; details of this will be presented in Ker et 
al. (in prep). A 2nd order polynomial fit provides a good fit 
to the radio spectra of the vast majority of sources in each 
sample. 
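
The polynomial fit and gradient above can be sketched with NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, and the rest-frame index is taken to be the local slope of the observed spectrum at the observed frequency ν/(1+z), which is one natural reading of the method (the full details are deferred by the authors to Ker et al., in prep).

```python
import numpy as np


def fit_spectrum(freq_hz, flux):
    """Fit log S = a1 + a2*log(nu) + a3*log(nu)^2; return (a1, a2, a3)."""
    x = np.log10(np.asarray(freq_hz, dtype=float))
    y = np.log10(np.asarray(flux, dtype=float))
    a3, a2, a1 = np.polyfit(x, y, 2)  # polyfit lists the highest power first
    return a1, a2, a3


def spectral_index(coeffs, freq_hz):
    """Local gradient alpha = a2 + 2*a3*log(nu) at observed frequency nu."""
    _, a2, a3 = coeffs
    return a2 + 2.0 * a3 * np.log10(freq_hz)


def rest_frame_index(coeffs, rest_freq_hz, z):
    """Index at a rest-frame frequency: light emitted at rest_freq_hz is
    observed at rest_freq_hz / (1 + z), so evaluate the slope there."""
    return spectral_index(coeffs, rest_freq_hz / (1.0 + z))


# Synthetic curved spectrum (illustrative values only).
freqs = np.array([74e6, 151e6, 325e6, 610e6, 1.4e9, 4.85e9])
true = (2.0, 1.0, -0.1)  # a1, a2, a3; negative a3 steepens at high frequency
log_nu = np.log10(freqs)
fluxes = 10.0 ** (true[0] + true[1] * log_nu + true[2] * log_nu ** 2)

coeffs = fit_spectrum(freqs, fluxes)
alpha_obs = spectral_index(coeffs, 1.4e9)
alpha_rest = rest_frame_index(coeffs, 1.4e9, z=1.0)
```

For a spectrum that steepens towards high frequency (negative a3), alpha_obs is steeper than alpha_rest at the same nominal frequency, which is the sense of the k-correction discussed in the text.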

We then performed a simple linear fit to the observed 
and rest-frame spectral index measured at three frequencies 
as a function of log(1+z) for both CoNFIG and 7CRS (see 
Figure 6). We performed the fit only on sources with an 
observed spectral index between 1.4 GHz and 151 MHz less 
than -0.5 and with a well determined radio spectrum. The 
gradients of these fits then reflect the strength of the 
z-α correlation present (if any). The results we obtain are 
striking. For 7CRS we confirm that the gradient of both the 
observed and the rest-frame z-α correlation increases with 
the frequency at which α is measured, as first reported by 
Blundell et al. (1999). We also see that the measured z-α 
correlation is approximately twice as steep in the observed 
frame as in the rest frame (dependent on frequency). It 
is also worth noting that for 7CRS, contrary to the 
z-α correlation, the D-α correlation strengthens in the 
rest frame. 

Similarly, for CoNFIG we see that the observed-frame 
correlation can be 50% steeper or more, at all frequencies, 
than that measured in the rest-frame. However, the increase 
in gradient with frequency is not seen. We suggest that this 
is because GHz selected samples pick out very different 
proportions of various types of radio source, favouring young 
GPS/CSS, core and beamed sources (much higher 
orientation bias). 

To test this, we ran the fits again, this time 
excluding all known quasars and objects classified as compact in 
CoNFIG, and sources with a size less than 30 kpc in 7CRS. 
This ensures that the vast majority of sources included in 
both samples will be lobe dominated and working against 
the IGM, and that the samples are not heavily contaminated 
by beamed sources, or by sources so small that they are still 
propagating through the medium of their host galaxy rather than 
the IGM. The results are plotted in the bottom panel of 
Figure 6 and the difference is clear to see. Both CoNFIG 
and 7CRS now follow very similar relations, both displaying 
observed gradients which are approximately twice as strong 
as the rest-frame gradients, but which are now largely 
independent of the frequency at which α is measured. It is 
interesting to note that the strength of the gradient for 
observed α for both samples is very similar to that determined 
by Ubachukwu et al. (1995) for a sample of radio galaxies 
compiled from the 3CRR and WP85 samples, again 
excluding compact sources. 

Our results confirm that once the k-correction is 
removed, a weak correlation between z and α remains for 
extended radio galaxies, which would fit in with a scenario 
where lobes are working against a denser environment at 
higher redshift, and hence high frequency losses are greater. 
Miley & De Breuck (2008) note, however, that it is very 
difficult to reproduce the observed z-α relation from this 
somewhat simplistic density-dependent effect. They suggest that, 
as the density of gas around high redshift sources has been 
observed to be highly inhomogeneous, and denser close to 
the nucleus of the galaxy, the ultra-steep radio spectra 
are produced by some as yet unknown mechanism within 
the host galaxy, rather than by the IGM conditions through 
which the radio lobes are propagating. 

Figure 5. Radio spectra for the nine highest redshift z>4 radio 
galaxies known. Fluxes were obtained from the NASA 
Extragalactic Database (NED), at frequencies ranging from 38 MHz to 8 
GHz. The flux scale is offset by a small arbitrary amount for each 
source to allow the shapes of the radio spectra to be compared. 
TN J0924 (z=5.19), J1639 (z=4.88) and CEN69 (z=4.11) have 
spectral indices which imply that they should be detected in the 
VLSS (Cohen et al. 2007) assuming a straight spectrum; however, 
all three are not. We measure the noise (σ) at each source position 
in the VLSS maps, take the 2σ value as the upper limit to the 74 
MHz flux, and assume a 1σ error. Note that, interestingly, the 
two highest redshift known sources are easily detectable at 1.4 
GHz in the NVSS, at 71.5 mJy and 21.8 mJy respectively, but 
would not be detected in any currently existing 150 MHz or 74 
MHz surveys. The vast majority of the spectra are classified as 
compact steep spectrum, most flattening towards lower 
frequencies (four of these are potentially peaked). Only one (7C1814, at 
the lowest redshift) is confirmed as straight over the frequency 
range 74 MHz to 5 GHz. This negates the common assumption 
that high redshift USS sources display no curvature over a large 
frequency range. The sources are ordered in P_1.4GHz (calculated 
with full curvature information), and it can be clearly seen that 
the most powerful sources are more likely to display significant 
curvature. It is also worth noting that all but one of these sources 
(7C1814) are compact, and have θ<6". 

We also conclude that GHz selected samples have a 
much greater orientation bias present, which can disguise the 
presence of the z-α correlation displayed by extended radio 
galaxies. We have also demonstrated that the 
k-correction is not negligible when measuring the strength 
of any z-α correlation, and can be responsible for more than 
50% of the strength of the observed gradient in a flux-limited 
sample. 



7 IMPLICATIONS FOR HIGH REDSHIFT 
SEARCHES 

The data collected for the nine complete samples allow us 
to measure, for the first time, the efficiency of the three most 
commonly used methods in the literature for searching for 
high redshift radio galaxies, namely radio spectral index, 
angular size and K-band magnitude. We are looking for a set of 
criteria which minimises the size of the selected subsamples 
that would require follow up observations, whilst 
maximising the number of high-z galaxies retained in this sample. 
We assume a definition of highest efficiency as maximising 
the difference between the total fraction of the sample 
recovered, and the total fraction of high-z sources recovered, 
with each increasing cut in the selection parameter under 
study. We choose to consider z>2 radio galaxies as high-z 
sources, as for the datasets under consideration this provides 
the optimal compromise between maximising the redshift 
whilst still maintaining sufficient high-z sources to allow a 
robust analysis. For comparison, we also show the 
analysis repeated for z>3 where possible, albeit with much lower 
number statistics (we have 10 z>3 radio galaxies with 
spectroscopic redshifts, see Table 7, and 6 with photometric 
redshifts in our samples). As there are only approximately 
50 radio galaxies with z>3 known (Ishwara-Chandra et al. 
2010), our samples are hence representative of the highest 
known redshift radio galaxy parameter space. 
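
The quantities defined above can be sketched for a single spectral index cut as follows. This is a toy illustration, not the analysis code: the function name cut_statistics, the dictionary keys, and the six-source sample are all hypothetical, and "efficiency" is implemented literally as the fraction of high-z sources recovered minus the fraction of the whole sample recovered.

```python
def cut_statistics(alphas, redshifts, alpha_cut, z_high=2.0):
    """Statistics for keeping only sources with alpha steeper than alpha_cut.

    frac_sample   : fraction of the whole sample that survives the cut
    completeness  : fraction of all z > z_high sources that survive
    highz_content : fraction of the surviving subsample with z > z_high
    efficiency    : completeness minus frac_sample
    """
    total = len(alphas)
    n_high = sum(1 for z in redshifts if z > z_high)
    kept = [(a, z) for a, z in zip(alphas, redshifts) if a < alpha_cut]
    kept_high = sum(1 for _, z in kept if z > z_high)
    frac_sample = len(kept) / total
    completeness = kept_high / n_high if n_high else 0.0
    highz_content = kept_high / len(kept) if kept else 0.0
    return {
        "frac_sample": frac_sample,
        "completeness": completeness,
        "highz_content": highz_content,
        "efficiency": completeness - frac_sample,
    }


# Toy sample (illustrative values only, NOT the paper's data).
toy_alpha = [-1.2, -1.0, -0.9, -0.7, -0.6, -0.55]
toy_z = [3.0, 2.5, 0.5, 2.2, 0.3, 1.0]
stats = cut_statistics(toy_alpha, toy_z, alpha_cut=-0.8)
```

For the toy sample, the α < -0.8 cut keeps half of the sources and two of the three z>2 objects. Scanning alpha_cut over a grid and picking the maximum of "efficiency" gives the optimal cut in the sense defined above; the same function applies to an angular size or K-band cut if the first argument is swapped for that observable and the comparison direction is adjusted.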



7.1 Spectral Index Selection 

As discussed above, an initial steep radio spectral index cut 
is an extremely popular method of reducing very large radio 
samples down to manageable sizes for imaging and 
spectroscopic follow-up, in order to locate high redshift sources. 
We now investigate whether a first spectral index cut does 
indeed recover a significant proportion of high-z sources 
present in the samples. Many recent studies in the 
literature base searches for high-z radio galaxies on the 
assumption that they may be distinguished by a steep spectrum. 
Ishwara-Chandra et al. (2010) provide a list of the highest 
redshift, z>3, known radio galaxies, 47 in number, the vast 
majority of which have been selected from a USS sample. 
However, Jarvis et al. (2009) also recently reported the 
discovery of the second highest redshift radio galaxy known, a 
source which they noted clearly does not have an ultra steep 
spectral index (see Figure 5). Work with the DRaGONS 
study (Schmidt et al. 2006), which uses a large, bright radio 
sample from the 1.4 GHz FIRST survey with redshifts 
estimated from the K-z relation, also suggests that even with a 
relatively flat spectral index selection criterion of α < -0.8, 
one third of the z>2 sources are missed. In Table 7 we 
present a list of the 10 radio galaxies with a confirmed 
spectroscopic redshift of z>3 from all of the samples used in this 
study. Five of these do have a steep α < -1.0; however, the 
remainder display a wide variety of spectral indices. 

In studies utilising an ultra steep selection criterion, an 
often used argument to justify the use of steep spectral index 
cut-offs is the apparent strong shift in redshift distribution to 
high redshift. However, in the majority of these studies (e.g. 
De Breuck et al. 2001; Bryant et al. 2009) the samples are 
very large (numbering in the hundreds) and spectroscopic 
follow up is expensive, so faint K or I/R band 
detections or limits are often used as additional selection criteria 
when deciding which targets to pursue with spectroscopy. 
This makes it very difficult to disentangle the extent to 
which the ultra steep spectrum or the optical/near-infrared 
selection criteria are responsible for preferentially selecting 
high redshift sources. 

[Figure 6: two panels showing the gradient of the z-α fit against Frequency/GHz; legend: CoNFIG Obs, CoNFIG Rest, 7CRS Obs, 7CRS Rest.] 

Figure 6. The upper panel shows the gradient 'a' from fitting 
α = a log(1+z) + b to rest-frame and observed spectral indices for 
CoNFIG and 7CRS. Whilst both samples show a clear decrease in 
the gradient of the z-α correlation when k-corrections are applied, 
only 7CRS shows a marked increase in strength with the 
frequency at which α is measured. The lower figure shows the same 
fit again for CoNFIG and 7CRS, but this time with all known 
quasars and compact objects (classified as compact in CoNFIG, 
or less than 30 kpc in size in 7CRS) removed. The gradients 
become very similar at all frequencies and for both samples when 
only extended sources are considered. These figures clearly show 
that the k-correction can be responsible for up to 50% of the z-α 
gradient observed in flux limited radio samples. 

Armed with spectroscopically complete samples at a variety
of flux density limits and finding frequencies, we can
determine the efficiency of the USS selection technique in
an unbiased way, for an observed α ≳ -1.2 (we have too few
sources steeper than this to study robustly). In Figure 7
we take each sample in its entirety, blindly apply a decreasing
spectral index limit, and calculate the median redshift
of the resulting sample of sources steeper than that limit.
Considering first the results with no cuts applied, Figure 7
offers a clear observational confirmation that the redshift
distributions of complete samples depend on the flux density
limit and selection frequency of each sample. It is
immediately apparent that low-frequency selected samples,
even at relatively bright flux density limits, select on
average higher redshift sources. For both low and high
frequency selected samples, applying a cut of -0.9 generally
increases the median redshift of the obtained sample.
That the median redshift decreases at very steep spectral
index cuts is most likely due to the inclusion of very steep
spectrum, low redshift clusters (e.g. De Breuck et al. 2000).
The samples selected at increasingly faint flux density limits
for both high and low frequency selected samples also
display higher median redshifts, except for the faint Hercules
sample at high frequency. As described in Best et al.
(2003), the CENSORS flux limit was chosen because, according
to the models of Dunlop & Peacock (1990), a survey
with a flux density limit of approximately 10 mJy at 1.4
GHz is optimal for detecting sources at redshifts greater
than 2.5, with the percentage of high-z sources detected
decreasing at lower and higher flux densities. Our results
for the GHz selected samples are consistent with this. Most
recently, Afonso et al. (2011) have extended the search for
ultra-steep spectrum, high redshift sources to the sub-mJy
level, and their sample appears to be broadly consistent in
expected content with the much brighter samples studied
here. Also illustrated clearly by Figure 7 is the fact that
the median values of redshift obtained from existing USS
selected samples are very similar to those we obtain for USS
selection of the complete samples used in this study (excepting
the De Breuck et al. (2000) USS sample, which includes
a very strong additional selection criterion of targeting only
those sources with the faintest K-band magnitudes).
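
The procedure behind Figure 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for this work: the function name, the (α, z) tuple layout and the toy source list are hypothetical, spectral indices are assumed to be negative for steep spectra (so "steeper" means more negative), and the five-source minimum is taken from the Figure 7 caption.

```python
from statistics import median


def median_redshift_vs_cut(sources, limits, min_sources=5):
    """Median redshift of the sources steeper than each spectral index limit.

    sources: list of (alpha, z) pairs (hypothetical layout).
    limits: spectral index limits to apply, e.g. [-0.5, -0.75, -1.0].
    Limits that leave fewer than min_sources sources are skipped.
    """
    results = []
    for limit in limits:
        # "Steeper than the limit" means a more negative spectral index.
        kept = [z for alpha, z in sources if alpha < limit]
        if len(kept) >= min_sources:
            results.append((limit, median(kept), len(kept)))
    return results


# Toy data for illustration only; these are not sources from any sample.
toy_sources = [(-0.6, 0.5), (-0.7, 0.8), (-0.8, 1.0), (-0.9, 1.5),
               (-1.0, 2.0), (-1.1, 2.5), (-1.2, 3.0)]

for limit, z_med, n in median_redshift_vs_cut(toy_sources, [-0.5, -0.75, -1.0]):
    print(f"alpha < {limit}: median z = {z_med} from {n} sources")
```

In this toy case the steepest limit leaves only two sources and is therefore dropped, which mirrors why the curves in such a plot stop at very steep cuts.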

Figure 7. The median redshift of all sources in a given sample which have spectral index steeper than a given spectral index, as a function
of the spectral index limit. Data points are only plotted if the remaining sample size is at least five sources. The left panel displays the
1.4 GHz-selected samples, and the right panel shows the 150 MHz-selected samples. Highlighted in stars are the De Breuck et al. (2000)
1.4 GHz USS sample median redshift and the median redshifts of the MRCR-SUMSS (Bryant et al. 2009), NVSS-SUMSS (Klamer et al.
2006), 4C USS (Chambers et al. 1996), 6C* (Jarvis et al. 2001) and 6C** (Cruz et al. 2007) USS samples. These samples have additional
biases due to K-band selection and incomplete spectroscopic redshifts, and hence a direct comparison is not possible, but it is interesting
to note that their median redshifts are broadly consistent with what we see in spectral index cut, spectroscopically complete samples.

Table 7. Observable Parameters for all spectroscopically confirmed radio galaxies at redshift z>3 in all of the samples studied. A CSS
radio spectrum indicates that the source is compact, steep and peaks at low frequencies. A C- spectrum displays negative curvature, but
no peak within the observed frequency range.

Name            Sample    z       K       α       D/kpc   Radio Spectrum
7C1745+6624     7CRS      3.01    20.25   -0.78   3.85    CSS
TOOTO-1214      TOOTS     3.081   18.6    -1.13   115     C-
CEN 16          CENSORS   3.126   19.32   -0.86   99.6    C-
7C 1748+6703    7CRS      3.2     18.27   -0.97   106     C-
6C 1232+3942    6CE       3.22    17.82   -1.14   228     C-
CEN 105         CENSORS   3.38    20.16   -1.16   50      straight
6C 0902+3419    6CE       3.4     19.70   -0.84   91.3    straight
CEN 24          CENSORS   3.43    19.30   -0.66   10      CSS
7C1814+6702     7CRS      4.05    19.16   -1.01   121     straight
CEN 69          CENSORS   4.11    19.60   -1.08   9.7     C-

In Figure 8 we consider the efficiency and completeness
of an ultra-steep spectrum criterion in selecting high-z
radio galaxies. The figure shows the proportion of high-z
sources recovered in each sample as steeper spectral index
cuts are applied, along with the proportion of the entire
sample that is returned, and the overall high-z content of
the reduced subsample. This analysis is carried out using
the 7CRS and TOOT-00 samples at low frequency, and CENSORS
and Hercules at high frequency. The brighter samples
are not included as they have very few z>2 sources,
and the combination of CENSORS and Hercules, 7CRS and
TOOT-00 provides two large samples of approximately 150
sources each. The results of this are very interesting: for the
low frequency selected sample, the baseline 15% fraction of
high-z sources in the recovered subsample nearly doubles to
30% with a spectral index cut α = -1, but at a cost of removing
60-70% of the known high-z sources from the recovered
subsample. For the high frequency selected samples there is
hardly any difference, regardless of the spectral index cut
applied. In other words, for the high-frequency selected sample,
by excluding sources flatter than the cut, we are not gaining
a substantial proportion of high-z sources above that which
we would expect if there were no correlation and the data
were distributed evenly across the α-z plane.
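
The two quantities discussed above, the high-z fraction of the retained subsample (efficiency) and the fraction of all high-z sources that survive the cut (completeness), can be made concrete with a short sketch. Everything here is an assumption for illustration: the function name, the dictionary keys and the toy catalogue are hypothetical, and the z>2 threshold simply follows the definition of "high-z" used in the text.

```python
def cut_statistics(sources, keep, z_min=2.0):
    """Summarise the effect of a selection cut on a sample.

    sources: list of dicts with at least a "z" key (hypothetical layout).
    keep: function returning True for sources that pass the cut.
    Returns the retained fraction of the sample, the high-z fraction of
    the retained subsample (efficiency) and the fraction of all high-z
    sources that are retained (completeness).
    """
    retained = [s for s in sources if keep(s)]
    n_highz_all = sum(1 for s in sources if s["z"] > z_min)
    n_highz_kept = sum(1 for s in retained if s["z"] > z_min)
    return {
        "retained_fraction": len(retained) / len(sources),
        "efficiency": n_highz_kept / len(retained) if retained else float("nan"),
        "completeness": n_highz_kept / n_highz_all if n_highz_all else float("nan"),
    }


# Toy catalogue for illustration only; not drawn from any real sample.
toy = [{"alpha": a, "z": z} for a, z in [
    (-0.6, 0.3), (-0.7, 0.6), (-0.8, 0.9), (-0.85, 1.2), (-0.9, 2.4),
    (-0.95, 1.0), (-1.05, 1.4), (-1.1, 2.6), (-1.2, 3.1), (-1.3, 0.2),
]]

# A steep-spectrum cut: keep only sources with alpha < -1.0.
stats = cut_statistics(toy, keep=lambda s: s["alpha"] < -1.0)
print(stats)
```

The same function can be reused for an angular size or magnitude cut by changing only the `keep` predicate, which is why a single definition of efficiency and completeness suffices for all of the cuts compared in this section.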

In utilising complete samples to address the question
of high-z selection efficiency using USS samples, our main
limitation is the low number of extreme spectral index
sources included in our collection of complete samples. Any
radio sample will include ~5% of sources with α < -1, and
~1% with α < -1.3 (e.g. De Breuck et al. 2000), and indeed
these proportions hold for the samples presented here: we
have too few sources to study the α < -1.2 range. What is
needed is complete spectroscopic follow-up of USS samples
encompassing these extreme spectra sources, in order to
determine the high-z fraction. However, very few USS samples
available in the literature have substantial spectroscopic
completeness. Most have additional optical or angular size
biases applied (e.g. Röttgering et al. 1996; De Breuck et al.
2001) when selecting candidates for spectroscopy. Chambers
et al. (1996) present one of the most complete USS samples
available. They study a small sample of 4C USS sources
(α < -1.0), selected from Tielens et al. (1979), which is 50%
spectroscopically complete, with 15 having R or I band
magnitudes, one with an I band limit, and one with no
magnitude data. There are eight sources with spectroscopic
redshift z>2. From the magnitude distribution, it is likely
that those without spectroscopic redshifts are in the range
1.0<z<1.6. This gives a fraction of the USS sample with z>2
of 24%, which is very much in line with our findings for low
frequency samples with this spectral index limit (see
Figures 7 and 8).

Figure 8. The bottom panel shows the fraction of all sources
(solid lines) and the fraction of z>2 (dashed) and z>3 (dotted)
radio galaxies that have steeper spectral indices than the given
limit, as a function of that limit, for both the high (black) and
low (red) frequency selected samples. The top panel displays the
fraction of high-z radio galaxies in the sample recovered by these
cuts. Note that the z>3 lines are much more uncertain, due to
the relatively low numbers of these (see text).

Figure 9. The bottom panel shows the fraction of all sources
(solid lines) and the fraction of z>2 (dashed) and z>3 (dotted)
radio galaxies that have smaller angular sizes than the given limit,
as a function of that limit, for both the high (black) and low (red)
frequency selected samples. The top panel displays the fraction of
high-z radio galaxies in the sample recovered by these cuts. Note
that the z>3 lines are much more uncertain, due to the relatively
low numbers of these.

We can conclude from this that a USS selection criterion 
does work at low frequency, but is not a strong effect, whilst 
it is inefficient for high frequency selected samples. 



7.2 Angular Size Selection 

Another often used criterion for maximising the high-z
content of radio source samples is that of angular size. In
Figure 9 we plot a similar diagram to that of the spectral
index cuts. In this, it is clear that moderate cuts can be made
to the sample based on angular size, whilst still ensuring the
large majority of high redshift sources remain.

The fraction of high-z sources in the recovered sample
is similar for both samples, remaining constant at ~15%
for the majority of angular size cuts, and increasing up to
~25% for angular size cuts less than 10 arcsec. Contrary to
a spectral index cut, angular size cuts prove to be generally
more effective for high frequency selected samples. For
example, applying a cut of 5 arcsec retains 70% of known
high-z sources in the sample, whilst reducing the total sample
to 40% of its original size. However, a similar cut for the
low frequency sample retains less than 40% of the known
high-z sources in the sample. Despite the high-z fraction
in the remaining subsample having nearly the same dependence
on the θ cut for both high and low frequency selected
samples, at low frequency a much smaller proportion of the
total high-z galaxies is recovered.

Figure 10. The bottom panel shows the fraction of all radio
galaxies (solid lines), and the fraction of z>2 (dashed), z>3 (dotted)
radio galaxies that have fainter K-band magnitudes than the
given limit (bins of one magnitude), as a function of that limit,
for both CENSORS (black) and 7CRS (red) samples. The top
panel displays the fraction of z>2, z>3 radio galaxies in the two
samples recovered by these cuts. Note that the z>3 lines are much
more uncertain, due to the relatively low numbers of these.

We conclude that angular size cuts can successfully re- 
tain the majority of high-z sources, whilst almost halving 
the original sample size for high-frequency selected samples. 
A larger angular size cut must be applied to low frequency 
samples in order to retain the same efficiency as seen for high 
frequency samples with a smaller cut applied. However, once 
again this is not a particularly efficient technique. 



7.3 K-band Selection

Selecting high redshift galaxies from near infrared imaging
is possible, thanks to the very tight relation observed between
K-band magnitudes and redshift (e.g. Lilly & Longair
1981; Willott et al. 2003). It is worth noting here too that
developments in recent years have identified a new population
of radio sources without optical or infrared detections,
Infrared-Faint Radio Sources (IFRS). These are potentially
excellent very high redshift candidates; see for example
Middelberg et al. (2011) and Garn & Alexander (2008).
The second highest redshift known radio galaxy, identified by
Jarvis et al. (2009), is an IFRS, and was selected for follow-up
based purely on its lack of optical or K-band detection
(it has a spectral index which is not USS, α = -0.75).

The main drawback with this method, however, is that
very deep K-band imaging is required over the radio survey
area. To illustrate the efficiency of K-band imaging in
selecting high redshift radio sources, we carry out a similar
analysis to that performed for spectral index and angular
size. We utilise only CENSORS and 7CRS for this analysis,
as both samples are highly spectroscopically complete, and
have readily available K-band data. For both of these samples,
in addition to the -0.5 spectral index cut as detailed in
Section 4, we also exclude all known radio quasars, as these
do not follow the K-z relation of radio galaxies. The high
spectroscopic completeness is necessary, as we only want to
use sources with spectroscopic redshifts, and not those with
redshifts estimated from the K-z relation. The aperture
corrected K-band data for CENSORS are taken from Brookes
et al. (2006) and Rigby et al. (2011), and for 7CRS, aperture
corrected K-band data were obtained from a publicly
available online catalogue¹. It should be noted that 7CRS
does not have complete K-band data for the sample, with 26
of the 92 radio galaxies having no K magnitude. However,
all but one of these sources without a K measurement are at
redshift one or below, and given the very tight K-z relation,
all of these are expected to be bright, K<17 sources, and
should not significantly affect the analysis of high-z sources
in this sample.

¹ https://www.astrosci.ca/users/willottc/kz/kz.html



In Figure 10 we plot the fraction of high-z sources recovered
with an increasing K-band magnitude cut for CENSORS
and 7CRS. It is immediately clear that a cut of 18.5
in K-band magnitude recovers almost all high-z sources for
both samples, with very few low redshift sources included. In
previous years, applying this technique required dedicated
deep K-band surveys, expensive in telescope time: cross-matching
with existing wide area K-band surveys such as
2MASS would potentially reduce the sample size by 10-20%,
but this is limited by the bright K-band magnitude
limits. The release of UKIDSS (Lawrence et al. 2007) Large
Area Survey data mitigates this somewhat, as the K-limit
reaches 18th magnitude (Vega), and covers many thousands
of square degrees in sky area. If we apply an 18th magnitude
limit to our samples, then all the high-z sources are recovered,
whilst the sample is reduced to ~30% of its original
size. This is a far more successful selection method than any
based on radio properties alone, and is now feasible over
large sky areas, given high resolution, wide and deep radio
surveys (limited to GHz frequencies) to enable matching
of host galaxy to radio source. Note that in order for this
technique to be successful, the radio data need to be of
sufficiently high angular resolution to allow robust matching of
radio sources to host galaxies: in the next few years, LOFAR
will produce such wide-area, sensitive, high-angular resolution
radio surveys. K-band imaging to depths of 19 and below
would be still more efficient (especially for even higher
redshift cuts) but is extremely expensive in telescope time,
and is impractical to carry out over the large areas necessary
to locate significant numbers of high redshift radio
AGN.



7.4 Optimal Search Criteria for High-z Radio 
Galaxies 

Many combinations of cuts using the K-band magnitude,
angular size and spectral index have been utilised in the
literature, but as yet there have been no investigations into
the most efficient combination of these for selecting high-z
radio galaxies. As we have shown previously, there are
some correlations present between D, α and z in flux limited
samples, in addition to the well known K-z relation.

We therefore test whether fitting a simple relation to
these observed parameters would enable a more efficient
selection to be made. We fitted a function firstly to angular
size and observed spectral index (i.e. a radio-only selection
method), and then to angular size, spectral index and K-band
magnitude (just for the radio galaxies, cf. previous
section) as follows:

\[ \log(1+z) = a_1 \log\theta + a_2\,\alpha + a_3 \tag{1} \]

\[ \log(1+z) = a_1 (K-18)^2 + a_2 (K-18) + a_3 \log\theta + a_4\,\alpha + a_5 \tag{2} \]

Having obtained best-fit parameters, we used these two
relations to derive a predicted redshift, z_p, for each source
(see Table 8). The results of this can be seen in Figure
11. The combination of radio observables does far better
than fitting only one single radio parameter (spectral index)
alone, whereas in contrast, the addition of radio variables
to the K-band function provides little discernible improvement
over fitting K-band magnitudes alone. Applying these
findings, we then repeated the analysis of the previous
subsections by applying increasing predicted redshift cuts from
these two relations. The results are shown in Figures 12 and
13. Whilst the z(θ,α) relation does not give a perfect fit to
the data (see Table 8), applying cuts based on the predicted
redshifts results in a substantially higher efficiency than any
one radio variable cut alone (see Figure 12). The z(θ,α) fit is
less efficient for the high frequency sample CENSORS than
for the low frequency selected 7CRS, as we would expect
from the findings of the preceding subsections. The z(K,θ,α)
fit appears equally efficient for both (note that 7CRS falls
off more quickly in Figure 13 as it contains far fewer sources
above z=3 than CENSORS), but on comparison with a simple
K magnitude fit (see Figure 11), any improvement is very
marginal.

Figure 11. Each panel plots the predicted redshift versus the actual redshift for CENSORS (dots) and 7CRS (squares), for three different
relations. The left panel plots the predicted redshift from a simple linear fit to spectral index, and the middle the predicted redshift from
fitting z as a function of θ and α as described in the text. It is immediately clear that the combination of both radio observables is much
more effective in predicting the source redshift. The panel on the right displays the predicted redshift from fitting z as a function of K,
θ and α, a relation which offers little improvement over a simple fit to K magnitude alone. Plotted in red stars are the highest redshift
radio galaxies known with K, θ and α data available, all of which would be successfully singled out using z_predict(K,θ,α).

Table 8. The fits to z(θ,α) and z(K,θ,α) for 7CRS and CENSORS as described in the text.

Sample    Function                                                                      σ(z_obs-z_p)  a_1      a_2      a_3      a_4      a_5
7CRS      log(1+z) = a_1 logθ + a_2 α + a_3                                   -0.26   0.64          -0.1213  -0.5840  0.045
CENSORS   log(1+z) = a_1 logθ + a_2 α + a_3                                   -0.45   0.88          -0.1687  -0.1973  0.40725
CENSORS   log(1+z) = a_1 (K-18)^2 + a_2 (K-18) + a_3 logθ + a_4 α + a_5       -0.08   0.40          0.011    0.11287  -0.0497  0.0024   0.4074
7CRS      log(1+z) = a_1 (K-18)^2 + a_2 (K-18) + a_3 logθ + a_4 α + a_5       -0.12   0.47          0.004    0.1025   -0.003   -0.1327  0.2963

As a final test, we also calculate z(θ,α) and z(K,θ,α)
for the nine highest redshift radio sources known (see Figure
11). For all of these sources, the z(K,θ,α) and z(θ,α) relations
predict high redshifts, z>2, which if we applied as cuts to a
complete sample of radio galaxies, would leave only a very
small proportion of the original sample.
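
As a rough illustration of how the fitted relations (1) and (2) turn observables into a predicted redshift, the sketch below evaluates them with the coefficients listed in Table 8. Several points are assumptions rather than statements from the text: the logarithms are taken to be base 10, θ is taken to be in arcsec, the assignment of the tabulated numbers to a_1 ... a_5 follows the column order of the table, and the function and dictionary names are hypothetical.

```python
import math

# Coefficients (a_1, a_2, a_3) of the radio-only relation (1), from Table 8.
RADIO_FIT = {
    "7CRS": (-0.1213, -0.5840, 0.045),
    "CENSORS": (-0.1687, -0.1973, 0.40725),
}

# Coefficients (a_1 ... a_5) of the K-band plus radio relation (2), from Table 8.
K_RADIO_FIT = {
    "CENSORS": (0.011, 0.11287, -0.0497, 0.0024, 0.4074),
    "7CRS": (0.004, 0.1025, -0.003, -0.1327, 0.2963),
}


def z_pred_radio(sample, theta, alpha):
    """Predicted redshift from angular size and spectral index, relation (1).

    Assumes base-10 logarithms and theta in arcsec.
    """
    a1, a2, a3 = RADIO_FIT[sample]
    return 10 ** (a1 * math.log10(theta) + a2 * alpha + a3) - 1.0


def z_pred_k_radio(sample, k_mag, theta, alpha):
    """Predicted redshift from K magnitude, angular size and spectral index,
    relation (2), under the same assumptions."""
    a1, a2, a3, a4, a5 = K_RADIO_FIT[sample]
    dk = k_mag - 18.0
    log_1pz = a1 * dk ** 2 + a2 * dk + a3 * math.log10(theta) + a4 * alpha + a5
    return 10 ** log_1pz - 1.0


# Hypothetical source: 10 arcsec across, alpha = -1.0, K = 19.
print(round(z_pred_radio("7CRS", 10.0, -1.0), 2))
print(round(z_pred_k_radio("CENSORS", 19.0, 10.0, -1.0), 2))
```

A predicted-redshift cut of the kind shown in Figures 12 and 13 then amounts to keeping only those sources for which the returned value exceeds the chosen limit.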



8 CONCLUSIONS 

The main conclusions of this paper are: 

• The strongest independent relation measured in both
high and low frequency selected samples, excluding the P-z
correlation (which is a selection effect), is between D and α.

• The observed z-α correlation reaches maximum
strength for an observed α measured at high frequencies,
in a low frequency selected sample. However, this correlation
is weak in comparison to the other observed correlations
between α-D and D-z.

• Up to 50% of the measured z-α gradient can be contributed
by a k-correction, in both high and low frequency
selected samples. This is important as almost all known z>4
galaxies display curvature in their spectrum.

• Selecting high redshift (z>2) sources based only on their
observed α provides only a small increase in searching
efficiency for low frequency selected samples, and almost none
for high frequency selected samples. Table 9 displays the
fraction of the samples that have z>2 and z>3 for a selection
of the observational cuts studied.

• Whilst we confirm the presence of a z-α correlation for
extended classical radio galaxies, if it arises as a result of radio
lobes working against an increasingly dense IGM, giving
a steeper spectrum, we caution that this may not be as useful
at the very highest redshifts. The very highest z>4 known
radio sources present observational characteristics which are
more consistent with being young radio sources, still confined
within their host galaxies.

• K-band selection is very much more efficient than radio-based
selection at maximising the number of high-z galaxies
selected whilst minimising the total sample size. Recent
existing surveys such as the UKIDSS Large Area Survey are just
deep enough to enable efficient searches.

• Searching based on a combination of criteria, such as
near-infrared magnitude, radio spectrum, and θ provides
optimal searching efficiency for all types of radio source at high
redshift.

Figure 12. The bottom panel shows the fraction of all sources
(solid lines), and the fraction of z>2 (dashed) and z>3 (dotted)
radio galaxies that have larger predicted redshifts z(θ,α) than the
given limit, as a function of that limit, for both CENSORS (black)
and 7CRS (red) samples. The top panel displays the fraction of
high-z radio galaxies in the sample recovered by these cuts. Note
that the z>3 lines are much more uncertain, due to the relatively
low numbers of these.

Figure 13. The bottom panel shows the fraction of all radio
galaxies (solid lines), and the fraction of z>2 (dashed) and
z>3 (dotted) radio galaxies that have larger predicted redshifts
z(K,θ,α) than the given limit, as a function of that limit, for both
CENSORS (black) and 7CRS (red) samples. The top panel displays
the fraction of high-z radio galaxies in the sample recovered
by these cuts. Note that the z>3 lines are much more uncertain,
due to the relatively low numbers of these.

The key finding of this paper is that the efficiency of the
Ultra Steep Spectrum criterion alone in selecting the highest
redshift radio galaxies is not as robust as has sometimes
been implied in the literature. We do see a z-α correlation,
but it is weak, and the intrinsic scatter in α dominates.
The z-α correlation is strongest for extended sources, which
is consistent with the interpretation of radio lobes growing
into a denser IGM as redshift increases.

The strongest correlation which we observe in the data,
between D and α, can be easily understood: as the sources
grow, they age and the spectrum grows steeper. In addition
to this, as a result of synchrotron self-absorption, young
sources generally have a turn-over in their spectra (e.g. GPS,
CSS sources) which gives rise to a flatter spectrum. These
sources are usually small, being recently triggered, and often
still propagating through the host galaxy. These small, flat
sources again contribute to the strong observed correlation
between D and α.

These young sources also contribute to the strong correlation
observed between z and D, where sources are on
average smaller at higher redshifts. This may be understood
in the context of the 'youth-redshift degeneracy' outlined by
Blundell & Rawlings (1999). Their argument is that sources
at high redshifts are increasingly likely to be young, and
hence smaller, because radio sources fade as they grow in
size due to the decreasing ambient density, and any flux-limited
sample selects only the most luminous sources at
high redshift. The degeneracy is most pronounced over a
luminosity range where the luminosity function is steep (i.e.
above the break) and hence is typically stronger at high
redshift than low redshift for current flux limits. In higher
frequency samples, the degeneracy may be enhanced further,
as synchrotron losses lead to a faster drop in the luminosity
with age. Identifying high redshift candidates in the radio
regime requires a sufficiently young source that synchrotron
and inverse Compton losses have not yet had time to deplete
the rest-frame GHz part of the spectrum, making the
source too faint to be included. This D-z relation has
implications for the z-α correlation, in that as we move out
to higher and higher redshifts, we will eventually reach a
regime where radio sources are mostly ~host galaxy sizes.
The association of a significant fraction of Infrared Faint Radio
Sources (which are radio sources without an optical or
infrared identification, and hence potentially high redshift
candidates, but often not with an Ultra-Steep Spectrum;
e.g. Jarvis et al. 2009), as CSS sources (Middelberg et al.
2011) offers further support for this. Some CSS sources are
very luminous, and can display observed spectral indices
of a steepness comparable to Ultra Steep Spectrum sources
(cf. O'Dea 1998), which would be expected as the source is
expanding through the dense medium of its host. If sources
are still propagating through the host galaxies, as opposed to
the IGM, this may change the nature of the z-α correlation
at high redshifts, as CSS/GPS sources have a self-absorbed
(peaked) radio spectrum. Such sources may be selected on
a USS spectral index in the GHz regime, but not at lower
frequencies. Table 7 and Figure 9 both suggest that the
fraction of young CSS sources gets higher at high redshifts.

Table 9. The fraction of high-z sources in samples with various observational cuts applied. Column 1 displays the samples used, the
second and third columns display the fractions of z>2, z>3 radio galaxies in the whole sample, and the fourth column the observational
cut (in spectral index, size, K-band, or predicted redshift from a combination of these) applied to the sample(s) used. Column 5 displays
the total fraction of the sample(s) that is returned by applying the observational cut, columns 6 and 7 display the fractions of
z>2, z>3 radio galaxies in the returned sample, and the final two columns display the fractions of z>2, z>3 sources lost by the cut.

Samples    % of whole sample    Observational     % of whole sample   % of retained sample   % of sources lost by cut
used       at z>2    at z>3     cut applied       retained            at z>2    at z>3       at z>2    at z>3
CEN,Her    13%       4%         α<-1.0            10%                 20%       8%           62%       50%
7C,TOOT    12%       4%         α<-1.0            20%                 33%       6%           68%       83%
CEN,Her    13%       4%         θ<10 arcsec       55%                 20%       6%           19%       17%
7C,TOOT    12%       4%         θ<10 arcsec       30%                 20%       4%           47%       67%
CEN        15%       5%         z_p(α,θ)>2        25%                 30%       13%          47%       20%
7C         11%       3%         z_p(α,θ)>2        15%                 35%       12%          54%       33%
CEN        18%       6%         K>19              20%                 83%       30%          8%        0%
7C         21%       5%         K>19              10%                 70%       29%          62%       33%
CEN        18%       6%         z_p(K,α,θ)>2      20%                 80%       28%          8%        0%
7C         21%       5%         z_p(K,α,θ)>2      20%                 62%       16%          38%       33%

Radio-based techniques could be expanded to compare
candidate source sizes with the location of the spectral peak
(e.g. Falcke et al. 2004), as well as the radio spectral shape.
Especially in combination with existing and up-coming deep,
widefield optical and near-infrared data, next generation
instruments such as LOFAR and the SKA will provide the
crucial high resolution and sensitivity across a wide spectral
range necessary to do this, and in conjunction with up-coming
high frequency wide area surveys such as WODAN
(Röttgering et al. 2011), will enable very good high redshift
radio source candidates to be successfully located.